CURRENT DESIGN


ALLOCATOR'S SPOT IN THE ECOSYSTEM 


[Diagram: two applications and a Wayland EGL client; Vendor1, Vendor2, and ION drivers, and DRM; a userspace/kernel boundary; Device 1, Device 2, Device 3, and ION.]
ALLOCATOR OBJECTS


ASSERTION
USAGE
CONSTRAINT
CAPABILITY
CAPABILITY SET
CURRENT WORKFLOW


Based on USAGE.md in the GitHub project


1. Initialize an allocator device from a device file descriptor
2. Query capability sets from the device given an assertion and list of usages
3. [Optional] Query capability sets from additional devices with the same parameters
4. [Optional] Merge capability sets of desired devices to find common capabilities
5. Try allocating a surface on available devices until allocation succeeds
6. Import surfaces to graphics APIs, mode setting APIs, video APIs, etc.
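
The six steps above, in order, can be sketched in C. This is only an illustration of the control flow: every type and function name below is a hypothetical stand-in rather than the prototype's API, the capability set is reduced to a bitmask, and the query and allocation functions are stubs (a real backend would ask its driver).

```c
#include <stdbool.h>
#include <stddef.h>

/* Every name here is a hypothetical stand-in, not the prototype's API. */
typedef enum { USAGE_TEXTURE, USAGE_DISPLAY } usage_t;
typedef struct { unsigned width, height; } assertion_t;
typedef struct { unsigned caps; } capset_t;      /* capability bitmask */
typedef struct { int fd; } device_t;
typedef struct { int owner_fd; capset_t set; } surface_t;

/* Step 1: wrap an already-open device file descriptor. */
bool device_init(int fd, device_t *dev) {
    if (fd < 0) return false;
    dev->fd = fd;
    return true;
}

/* Steps 2-3 (stub): every device reports bit 0 (think "pitch-linear")
 * plus one bit that depends on the device. */
bool query_capset(const device_t *dev, const assertion_t *a,
                  const usage_t *usages, size_t num_usages, capset_t *out) {
    (void)usages;
    if (a->width == 0 || a->height == 0 || num_usages == 0) return false;
    out->caps = 1u | (2u << (dev->fd & 3));
    return true;
}

/* Step 4: keep only the capabilities common to both sets. */
bool merge_capsets(const capset_t *a, const capset_t *b, capset_t *out) {
    out->caps = a->caps & b->caps;
    return out->caps != 0;
}

/* Step 5 (stub): pretend devices with an odd fd cannot allocate. */
bool device_alloc(const device_t *dev, const capset_t *set, surface_t *surf) {
    if (dev->fd % 2 != 0) return false;
    surf->owner_fd = dev->fd;
    surf->set = *set;
    return true;
}

/* Steps 1-5 end to end; step 6 (import) is left to the consuming API. */
bool allocate_shared(const int *fds, size_t num_fds, const assertion_t *a,
                     const usage_t *usages, size_t num_usages,
                     surface_t *surf) {
    device_t devs[8];
    capset_t common = { ~0u };
    if (num_fds == 0 || num_fds > 8) return false;
    for (size_t i = 0; i < num_fds; i++) {
        capset_t set;
        if (!device_init(fds[i], &devs[i]) ||
            !query_capset(&devs[i], a, usages, num_usages, &set) ||
            !merge_capsets(&common, &set, &common))
            return false;
    }
    /* Try each device in turn until one allocation succeeds. */
    for (size_t i = 0; i < num_fds; i++)
        if (device_alloc(&devs[i], &common, surf)) return true;
    return false;
}
```

Calling `allocate_shared` with the made-up descriptors 3 and 4 merges the two stub capability sets down to their common bit, fails on the first device, and succeeds on the second.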
PROTOTYPE STATUS


SUPPORTED/PLANNED FUNCTIONALITY 


Goal is to Encourage and Substantiate Design Discussion 


Creating Devices - IMPLEMENTED 

Querying Capabilities and Constraints - IMPLEMENTED 

Merging Capabilities and Constraints - IMPLEMENTED 

Creating Allocations from Capabilities and Constraints - IMPLEMENTED 
Exporting/Importing Allocations - TODO 

Using Allocations in Vulkan/OpenGL - TODO 

Using Allocations in DRM/Non-Graphics Devices - TODO 




CAPABILITY SET MATH 


Core of the Design 


Current set derivation algorithm: merge/union constraints, intersect capabilities 
Capabilities can be "required". If an operation removes a required capability, it fails


Needs more validation. Throw your worst usage/constraints/capabilities at it! 
CAPABILITY SET MATH EXAMPLE


Constraints:
1. Address aligned to 32B
Capabilities:
1. NVIDIA tiling/layout (*)
2. NVIDIA FB compression


Constraints:
1. Address aligned to 64B
Capabilities:
1. Pitch-linear layout (*)


Constraints:
1. Address aligned to 32B
2. Pitch aligned to 64B
Capabilities:
1. Pitch-linear layout (*)
2. Dev2 FB compression
CAPABILITY SET MATH EXAMPLE


Constraints:
1. Address aligned to 32B
Capabilities:
1. NVIDIA tiling/layout (*) [struck through]
2. NVIDIA FB compression [struck through]

+

Constraints:
1. Address aligned to 32B
2. Pitch aligned to 64B
Capabilities:
1. Pitch-linear layout (*) [struck through]
2. Dev2 FB compression


Constraints:
1. Address aligned to 32B
2. Pitch aligned to 64B
Capabilities:
CAPABILITY SET MATH EXAMPLE


Constraints:
1. Address aligned to 64B
Capabilities:
1. Pitch-linear layout (*)


Constraints:
1. Address aligned to 32B
2. Pitch aligned to 64B
Capabilities:
1. Pitch-linear layout (*)
2. Dev2 FB compression


Constraints:
1. Address aligned to 64B
2. Pitch aligned to 64B
Capabilities:
1. Pitch-linear layout (*)
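
The derivation rule (merge constraints, intersect capabilities, fail if a required capability is lost) and the two worked examples above can be sketched in C. This is a minimal sketch under stated assumptions, not the prototype's implementation: the types and names are hypothetical, (*) is taken to mark a required capability, and alignments are assumed to be powers of two so that merging two alignment constraints reduces to taking the larger one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified types for illustration only. */
#define MAX_CAPS 8

typedef struct {
    const char *name;      /* e.g. "pitch-linear layout" */
    bool required;         /* assumed meaning of (*) on the slides */
} cap_t;

typedef struct {
    unsigned addr_align;   /* bytes; 0 means unconstrained */
    unsigned pitch_align;  /* bytes; 0 means unconstrained */
    size_t num_caps;
    cap_t caps[MAX_CAPS];
} capset_t;

/* The three sets from the example slides. */
const capset_t nvidia_set = { 32, 0, 2,
    { { "NVIDIA tiling/layout", true }, { "NVIDIA FB compression", false } } };
const capset_t linear_set = { 64, 0, 1,
    { { "pitch-linear layout", true } } };
const capset_t dev2_set = { 32, 64, 2,
    { { "pitch-linear layout", true }, { "Dev2 FB compression", false } } };

static unsigned max_u(unsigned a, unsigned b) { return a > b ? a : b; }

static const cap_t *find_cap(const capset_t *s, const char *name) {
    for (size_t i = 0; i < s->num_caps; i++)
        if (strcmp(s->caps[i].name, name) == 0) return &s->caps[i];
    return NULL;
}

/* Derive a common set from a and b. Returns false if the intersection
 * would drop a required capability; *out is then only partially filled. */
bool derive_capset(const capset_t *a, const capset_t *b, capset_t *out) {
    memset(out, 0, sizeof *out);

    /* Merge constraints: the stricter (larger) alignment wins. */
    out->addr_align = max_u(a->addr_align, b->addr_align);
    out->pitch_align = max_u(a->pitch_align, b->pitch_align);

    /* Intersect capabilities, matching by name. */
    for (size_t i = 0; i < a->num_caps; i++) {
        const cap_t *other = find_cap(b, a->caps[i].name);
        if (other) {
            cap_t merged = a->caps[i];
            merged.required = a->caps[i].required || other->required;
            out->caps[out->num_caps++] = merged;
        } else if (a->caps[i].required) {
            return false;
        }
    }
    /* Required capabilities that exist only in b are also lost. */
    for (size_t i = 0; i < b->num_caps; i++)
        if (b->caps[i].required && !find_cap(a, b->caps[i].name))
            return false;
    return true;
}
```

Deriving from `nvidia_set` and `dev2_set` fails because the required NVIDIA tiling capability has no match, while `linear_set` and `dev2_set` yield 64B address alignment, 64B pitch alignment, and the single pitch-linear capability.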
PROBLEMS ENCOUNTERED


DEVICE ENUMERATION/CREATION/IMPORT 


Device file doesn't necessarily uniquely identify a logical device object 
Device creation from FD implies lack of need for additional /dev/file access 
Alternative of exporting devices from APIs is problematic too 


Enumeration/Correlation using UUID from Vulkan/GL APIs would provide consistency 
NO DEVICE-LOCAL CAPABILITIES


Ex: local caching 
GPU may have on-chip cache. When to use it? When capabilities say so, of course!
Other devices don't necessarily need to be aware of this cache usage 


Intersecting capabilities from other devices will remove this "local cache" capability 
FORMAT SPECIFICATION


Still an open issue that needs to be resolved. Prototype assumes RGBA8888 
Khronos Data-format spec, FOURCC, ??? 
Needs to handle HDR formats 


Should there be supported format enumeration? 
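
For reference, FOURCC (one of the candidates listed above) packs four ASCII characters into a 32-bit code. A minimal sketch, assuming the packing used by the Linux DRM headers (first character in the low byte), where RGBA8888 is spelled 'R', 'A', '2', '4'; the helper name `fourcc` is made up for illustration.

```c
#include <stdint.h>

/* Pack four ASCII characters into a 32-bit code, first character in the
 * low byte. The helper name is hypothetical. */
static inline uint32_t fourcc(char a, char b, char c, char d) {
    return (uint32_t)(unsigned char)a |
           ((uint32_t)(unsigned char)b << 8) |
           ((uint32_t)(unsigned char)c << 16) |
           ((uint32_t)(unsigned char)d << 24);
}
```

With this packing, `fourcc('R', 'A', '2', '4')` evaluates to 0x34324152.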
IMPORT TO EXTERNAL APIS


Unlike Vulkan/OpenGL import APIs, additional meta-data is needed 
How should that meta-data be packaged? DRM format modifiers not sufficient 
Does the capability set suffice? If so, see issue with device-local capabilities 


Is some level of in-kernel meta-data preferred? Limits future suballocation usage 
RELATIONSHIP TO DMA-BUF


Unclear if it should be required that import/export consume/produce DMA-BUF FDs 
Might bake Linux-specific assumptions into the API or usage 
Even FDs can be non-portable 


Any value in using DMA-BUF when usage is limited to a single device or driver stack? 
NEXT STEPS


USAGE TRANSITIONS 


Vulkan introduced the idea of explicitly transitioning between various surface uses 
Could be generalized across devices now that we can describe all usage explicitly 
Apps could query usage transitions “meta-data” from allocator for usage pairs 


That meta-data could then be passed into GPU APIs to perform transitions 
MOTIVATION FOR USAGE TRANSITIONS


Alternative proposal:
Re-allocate when usage changes

Justification:
Simpler API
Steady-state is still optimal

Problems:
Allocation can be expensive
Transitions have consistent cost
Usage may change at inconvenient times
USAGE TRANSITIONS (EXAMPLE)


// Some existing usage definitions
extern const usage_t samplingUsage;
extern const usage_t displayUsage;

// Usage lists
const usage_t sampling[] = { samplingUsage };
const usage_t samplingAndDisp[] =
    { samplingUsage, displayNVUsage };
const usage_t dispOnly[] = { displayUsage };
void *transitionData;
size_t transitionDataSize;

// Query a usage transition from an allocator library device
query_transition(dev,
                 ARRAYLEN(sampling), sampling,
                 ARRAYLEN(samplingAndDisp), samplingAndDisp,
                 &transitionDataSize, &transitionData);
USAGE TRANSITIONS (EXAMPLE)


// Program the transition in Vulkan
VkImageMemoryCrossDeviceBarrierEXT crossDeviceBarrierData = {
    VK_STRUCTURE_TYPE_IMAGE_MEMORY_CROSS_DEVICE_BARRIER_EXT, // sType
    NULL,                                                    // pNext
    transitionDataSize,                                      // dataSize
    transitionData                                           // data
};

VkImageMemoryBarrier usageTransitionBarrier = {
    VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER, // sType
    &crossDeviceBarrierData, // pNext, takes precedence over oldLayout/newLayout members

};

vkCmdPipelineBarrier(..., 1, &usageTransitionBarrier);




WAYLAND INTEGRATION 


Getting Back to our Original Goal... 


Last year NVIDIA presented a vendor-agnostic EGL winsys client integration layer API 
The sample implementation used EGLStream, but the API is mechanism-agnostic 
Key functionality: Ability to build an EGLSurface from some lower-level primitive 


How do we build an EGLSurface from allocator surfaces? 
WHERE DOES THE ALLOCATOR CODE GO?


The prototype is a stand-alone library with runtime-loadable driver backends 
However, the key mechanisms could live anywhere 
Is it easier to move to this new library, merge functionality into GBM, or ??? 


If we keep the allocator library, does it need a better name than liballocator? 
QUESTIONS


Any situations capability set math does not handle? 
How should device-local capabilities be handled? 
How should formats be defined? 

How should surface meta-data be represented? 

Is DMA-BUF a requirement? If so, why? 

How should EGLSurface integration work? 


Where does the allocator implementation live? 